UMass at TDT 2004

Authors

  • Margaret Connell
  • Ao Feng
  • Giridhar Kumaran
  • Hema Raghavan
  • Chirag Shah
  • James Allan
Abstract

Topic Detection classifies stories into different topics, but HTD requires more than that. Are there other entities between a story and a topic? [10] views a topic as a structure of inter-related events, which gives us a good hint for this new task. Experiments in [10] show that time locality is a very useful attribute in event organization, and it can also help to address the complexity problem in TDT2004. The TDT-5 collection contains 407,503 stories in three different languages, and the running time of traditional clustering algorithms is not acceptable for such a huge collection. Since stories in the same event tend to be close in time, we only need to compare a story to its "local" stories instead of the whole collection. The algorithm we use has two steps: bounded 1-NN for event formation and bounded agglomerative clustering for building the hierarchy. In the first step, all stories in the same original language and from the same source are extracted and ordered by time. Stories are processed one by one, and each incoming story is compared to a fixed number of stories preceding it. This number is approximately the number of stories in a token file; the value is 100 for the baseline run. If the similarity (cosine similarity of tf-idf term vectors) between the current story and the most similar previous story is larger than a given threshold (0.3 in the baseline run), the current story is assigned to the event that the most similar previous ...
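
The first step described in the abstract (bounded 1-NN event formation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the tf-idf weighting formula, and the rule that a story with no sufficiently similar neighbour starts a new event are all assumptions made for illustration; the abstract specifies only the look-back window (100), the threshold (0.3), and cosine similarity over tf-idf vectors. The input is assumed to be already restricted to one source and one original language and sorted by time.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) from tokenized stories.

    Assumed weighting: raw term frequency times log(1 + N / df).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(1.0 + n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bounded_1nn_events(vectors, window=100, threshold=0.3):
    """Assign an event id to each time-ordered story.

    Each story is compared only to the `window` stories before it,
    so the cost is O(n * window) rather than all-pairs.
    """
    events = []
    next_event = 0
    for i, vec in enumerate(vectors):
        best_sim, best_j = -1.0, None
        for j in range(max(0, i - window), i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j is not None and best_sim > threshold:
            # Join the event of the most similar previous story.
            events.append(events[best_j])
        else:
            # Assumption: otherwise the story starts a new event.
            events.append(next_event)
            next_event += 1
    return events


if __name__ == "__main__":
    stories = [
        ["quake", "hits", "city"],
        ["quake", "city", "rescue"],
        ["election", "vote", "results"],
        ["election", "vote", "recount"],
    ]
    print(bounded_1nn_events(tfidf_vectors(stories)))
```

Because each story looks back over a bounded window only, a related story that falls outside the window starts a separate event. This is the trade-off that the time-locality assumption accepts in exchange for a tractable running time on a collection of this size.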

Similar resources

UMass at TDT 2000

We had two thrusts to our research, neither of which was ready to be deployed in this evaluation. We report here on the results from the training data, in all cases explored within the link detection task. In the first direction, we looked more carefully at score normalization across different languages and media types. We found that we could improve results noticeably though not substantially ...

Distinct requirements for Ku in N nucleotide addition at V(D)J- and non-V(D)J-generated double-strand breaks.

Loss or addition of nucleotides at junctions generated by V(D)J recombination significantly expands the antigen-receptor repertoire. Addition of nontemplated (N) nucleotides is carried out by terminal deoxynucleotidyl transferase (TdT), whose only known physiological role is to create diversity at V(D)J junctions during lymphocyte development. Although purified TdT can act at free DNA ends, its...

TDT-2004: Adaptive Topic Tracking at Maryland

A topic tracking system that combines elements from vector space and language modeling frameworks to compute document scores is described. The model is used for both the traditional TDT topic tracking evaluation design and the new supervised adaptive topic tracking evaluation. Results indicate that supervised adaptation and score normalization should be more closely coupled, and that current te...

Mutational analysis of terminal deoxynucleotidyltransferase-mediated N-nucleotide addition in V(D)J recombination.

The addition of nontemplated (N) nucleotides to coding ends in V(D)J recombination is the result of the action of a unique DNA polymerase, TdT. Although N-nucleotide addition by TdT plays a critical role in the generation of a diverse repertoire of Ag receptor genes, the mechanism by which TdT acts remains unclear. We conducted a structure-function analysis of the murine TdT protein to determin...

Results of the 2003 Topic Detection and Tracking Evaluation

The National Institute of Standards and Technology (NIST) administered the sixth open evaluation of Topic Detection and Tracking (TDT) technologies in November of 2003. The TDT project supports development of technologies that automatically organize event-related news stories. The program leverages expertise in core technologies, Automatic Speech Recognition (ASR), Document Retrieval (DR), and M...

Publication date: 2004